Frequent Itemsets

Items

The set of all items sold in a market ...for e.g., in Walmart.

Itemsets

A subset of items.

Baskets

Individual sets of purchases made by customers.

Support for itemset I

The number of baskets containing all items in itemset I

Frequent Itemsets

Given a support threshold s, sets of items that appear in atleast s baskets are called frequent itemsets


In [2]:
# Finding frequent Itemsets - the naive method

items = {'m' , 'c', 'p', 'b' , 'j'} # abbreviated item names
support = 3 # the support threshold
baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'}, {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]

# todo: code to find frequent itemsets

Association Rules

  • If-then rules about the contents of the baskets
  • $ {i_1, i_2, ...i_k} \to j $ means: "if a basket contains all of $ i_1, i_2,...,i_k $ then it is likely to contain j"
  • Confidence of this association rule is the probability of j given $ i_1, i_2, ...i_k $
    • That is, the fraction of the baskets with $ i_1, i_2, ...i_k $ that also contain j

In [6]:
# 100,000 items = 5 billion pairs
no_of_items = 100000
no_of_pairs = no_of_items * (no_of_items - 1) / 2 # n choose 2

print no_of_pairs

# 4 bytes per count = 20 gb of memory

memory_in_bytes = 4
total_memory_reqd_bytes = no_of_pairs * memory_in_bytes

# 1 GB = 1,000 MB = 1,000,000 KB = 1,000,000,000 Bytes 
total_memory_reqd_gb = total_memory_reqd_bytes / (1000 * 1000 * 1000)  

print 'Memory Required (GB):', total_memory_reqd_gb


4999950000
Memory Required (GB): 19

In [ ]: